    Deep Remix: Remixing Musical Mixtures Using a Convolutional Deep Neural Network

    Audio source separation is a difficult machine learning problem, and performance is measured by comparing extracted signals with the component source signals. However, if separation is motivated by the ultimate goal of re-mixing, then complete separation is not necessary, and separation difficulty and separation quality depend on the nature of the re-mix. Here, we use a convolutional deep neural network (DNN), trained to estimate 'ideal' binary masks for separating voice from music, to re-mix the vocal balance by operating directly on the individual magnitude components of the musical mixture spectrogram. Our results demonstrate that small changes in vocal gain may be applied with very little distortion to the ultimate re-mix. Our method may be useful for re-mixing existing mixes.
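
    To make the mechanism concrete, below is a minimal sketch of mask-based vocal re-mixing in Python. It is not the paper's implementation: `predict_vocal_mask` is a hypothetical stand-in for the trained convolutional DNN, and the STFT parameters are arbitrary choices.

```python
# Minimal sketch of mask-based vocal re-mixing (hypothetical model interface).
import numpy as np
import librosa

def remix_vocals(y, sr, vocal_gain_db, predict_vocal_mask):
    """Re-mix the vocal level of a mixture by scaling masked spectrogram bins.

    `predict_vocal_mask` stands in for the trained DNN: it takes a magnitude
    spectrogram and returns a binary mask (1 = vocal-dominated bin).
    """
    S = librosa.stft(y, n_fft=2048, hop_length=512)   # complex spectrogram
    mag, phase = np.abs(S), np.angle(S)

    mask = predict_vocal_mask(mag)                    # binary mask, same shape as mag
    gain = 10.0 ** (vocal_gain_db / 20.0)             # dB -> linear amplitude gain

    # Scale only the vocal-dominated magnitude bins; keep the original phase.
    mag_remixed = mag * np.where(mask > 0.5, gain, 1.0)
    S_remixed = mag_remixed * np.exp(1j * phase)
    return librosa.istft(S_remixed, hop_length=512)
```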

    Hyperconnected Action Painting

    This performance invites the audience to participate in an immersive experience using their mobile devices. The aim is to capture their actions in a digital painting inspired by Jackson Pollock’s action painting technique. The audience is connected to a wireless network and a Web Audio application that recognizes a number of gestures through the mobile accelerometer sensor; each gesture triggers a different sound. Gestures will be recognized and mapped to a digital canvas. A set of loudspeakers will complement the audience’s actions with ambient sounds. The performance explores audio spatialization using both loudspeakers and mobile phone speakers, which, combined with the digital painting, provides an immersive audiovisual experience. The final digital canvas will be available online as a memory of the performance.
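
    The gesture-to-sound mapping can be pictured with a toy detector. The sketch below is an illustrative simplification (the actual piece runs as a Web Audio application in the browser); the threshold values and gesture labels are assumptions, not the performance's real mapping.

```python
# Toy threshold-based gesture detector for accelerometer samples
# (hypothetical simplification of the gesture-to-sound mapping).
import math

def classify_gesture(ax, ay, az, shake_threshold=18.0):
    """Map one accelerometer sample (m/s^2) to a gesture label.

    A magnitude well above gravity (~9.8 m/s^2) suggests a shake or flick;
    the dominant axis of a gentler reading suggests a tilt direction.
    """
    magnitude = math.sqrt(ax * ax + ay * ay + az * az)
    if magnitude > shake_threshold:
        return "shake"          # e.g. trigger a percussive 'splash' sound
    if abs(ax) > abs(ay) and abs(ax) > abs(az):
        return "tilt_sideways"  # e.g. pan an ambient sound left or right
    return "rest"               # no sound triggered
```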

    Performing Audiences: Composition Strategies for Network Music using Mobile Phones

    With the development of web audio standards, it has quickly become technically easy to develop and deploy software that invites audiences to participate in musical performances using their mobile phones. Thus, a new audience-centric musical genre has emerged, which aligns with artistic manifestations that explicitly include the public (e.g. participatory art, cinema or theatre). Previous research has focused on analysing this new genre from historical, social-organisation and technical perspectives. This follow-up paper contributes reflections on technical and aesthetic aspects of composing within this audience-centric approach. We propose a set of 13 composition dimensions that deal with, among others, the role of the performer, the role of the audience, the location of sound and the type of feedback. From a reflective approach, four participatory pieces developed by the authors are analysed using the proposed dimensions. Finally, we discuss a set of recommendations and challenges for the composer-developers of this new and promising musical genre. The paper concludes by discussing the implications of this research for the NIME community.

    Loop-aware Audio Recording for the Web

    Music loops are audio recordings used as basic building blocks in many types of music. The use of pre-recorded loops makes music creation accessible to users regardless of their background in music theory, and online loop databases afford simple collaboration and exchange. Hence, music loops are particularly attractive for web audio applications. However, traditional musical audio recording typically relies on complex DAW software: recording loops usually requires attention to musical meter and tempo, as well as enduring metronome sounds. In this paper, we propose loop-aware audio recording as a use case for web audio technologies. Our approach supports hands-free, low-stress recording of music loops on web-enabled devices. The system detects repetitions in an incoming audio stream and, based on this information, segments and ranks the repeated fragments, presenting the list to the user. We provide an example implementation and evaluate the use of the different MIR libraries available on the web audio platform for the proposed task.
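
    One plausible way to detect such repetitions, sketched below, is to autocorrelate an onset-strength envelope and segment the stream at the dominant lag. This is an assumed mechanism for illustration, not the paper's implementation (which targets MIR libraries running in the browser); librosa is used here for brevity.

```python
# Sketch of repetition detection via autocorrelation of onset strength
# (one plausible approach, not the paper's actual algorithm).
import numpy as np
import librosa

def estimate_loop_length(y, sr, hop_length=512, min_sec=1.0, max_sec=8.0):
    """Estimate the dominant repetition period (in samples) of a recording."""
    env = librosa.onset.onset_strength(y=y, sr=sr, hop_length=hop_length)
    env = env - env.mean()

    # Autocorrelation peaks indicate lags at which the signal repeats.
    ac = np.correlate(env, env, mode="full")[len(env) - 1:]
    lo = int(min_sec * sr / hop_length)
    hi = int(max_sec * sr / hop_length)
    lag = lo + int(np.argmax(ac[lo:hi]))           # best lag in frames
    return lag * hop_length                        # frames -> samples

def segment_repetitions(y, loop_len):
    """Cut the stream into candidate loop fragments of the estimated length."""
    return [y[i:i + loop_len] for i in range(0, len(y) - loop_len + 1, loop_len)]
```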

    Algorithms and representations for supporting online music creation with large-scale audio databases

    The rapid adoption of Internet and web technologies has created an opportunity for making music collaboratively by sharing information online. However, current applications for online music making do not take advantage of the potential of shared information. The goal of this dissertation is to provide and evaluate algorithms and representations for interacting with large audio databases that facilitate music creation by online communities. This work has been developed in the context of Freesound, a large-scale, community-driven database of audio recordings shared under Creative Commons (CC) licenses. The diversity of sounds available through this kind of platform is unprecedented. At the same time, the unstructured nature of community-driven processes poses new challenges for indexing and retrieving information to support musical creativity. In this dissertation we propose and evaluate algorithms and representations for dealing with the main elements required by online music making applications based on large-scale audio databases: sound files, including time-varying and aggregate representations; taxonomies for retrieving sounds; music representations; and community models. As a generic low-level representation for audio signals, we analyze the framework of cepstral coefficients, evaluating their performance on example classification tasks. We found that switching to more recent auditory filters, such as gammatone filters, improves, at large scales, on traditional representations based on the mel scale. We then consider three common types of sounds for obtaining aggregated representations. We show that several time series analysis features computed from the cepstral coefficients complement traditional statistics for improved performance. For interacting with large databases of sounds, we propose a novel unsupervised algorithm that automatically generates taxonomical organizations based on the low-level signal representations. Based on user studies, we show that our approach can be used in place of traditional supervised classification approaches for providing a lexicon of acoustic categories suitable for creative applications. Next, a computational representation is described for music based on audio samples. We demonstrate through a user experiment that it facilitates collaborative creation and supports computational analysis using the lexicons generated by sound taxonomies. Finally, we deal with the representation and analysis of user communities. We propose a method for measuring collective creativity in audio sharing. By analyzing the activity of the Freesound community over a period of more than 5 years, we show that the proposed creativity measures can be significantly related to social structure characterized by network analysis.
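
    The pipeline shape described above (time-varying cepstral features, aggregated into fixed-length vectors, then clustered into an unsupervised taxonomy) can be sketched as follows. MFCCs stand in for the gammatone-based cepstral coefficients the thesis advocates, and generic Ward-linkage clustering stands in for the thesis's taxonomy algorithm; both substitutions are assumptions for illustration.

```python
# Sketch of the pipeline shape: cepstral features -> aggregate vectors ->
# unsupervised taxonomy. MFCCs and Ward clustering are stand-ins, not the
# thesis's specific representations or algorithm.
import numpy as np
import librosa
from scipy.cluster.hierarchy import linkage, fcluster

def aggregate_features(path):
    """Summarize one sound file as a fixed-length vector."""
    y, sr = librosa.load(path, sr=22050, mono=True)
    cc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)   # (13, frames)
    # Aggregate the time-varying coefficients: mean and std per coefficient,
    # plus a simple temporal descriptor (mean frame-to-frame delta).
    delta = np.diff(cc, axis=1)
    return np.concatenate([cc.mean(axis=1), cc.std(axis=1),
                           np.abs(delta).mean(axis=1)])

def build_taxonomy(paths, n_categories=10):
    """Group sounds into acoustic categories via hierarchical clustering."""
    X = np.stack([aggregate_features(p) for p in paths])
    Z = linkage(X, method="ward")                      # dendrogram ~ taxonomy
    return fcluster(Z, t=n_categories, criterion="maxclust")
```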

    Representing music as work in progress

    In this chapter we discuss an approach to music representation that supports collaborative composition given current practices based on digital audio. A music work is represented as a directed graph that encodes sequences and layers of sound objects. We discuss graph grammars as a general framework for this representation. From a grammar perspective, we analyze the use of XML for storing production rules, music structure, and references to audio files. We describe an example implementation of this approach.
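
    A minimal sketch of such a representation follows: a directed graph whose nodes reference audio files and whose edges encode either sequencing ("play after") or layering ("play together"), serialized to XML. The element and attribute names are illustrative assumptions, not the chapter's actual schema.

```python
# Minimal sketch of a music work as a directed graph of sound objects,
# serialized to XML (illustrative element names, not the chapter's schema).
import xml.etree.ElementTree as ET

class MusicGraph:
    def __init__(self):
        self.nodes = {}   # node id -> audio file reference
        self.edges = []   # (src, dst, relation): "seq" = play after,
                          # "layer" = play together

    def add_sound(self, node_id, audio_ref):
        self.nodes[node_id] = audio_ref

    def connect(self, src, dst, relation):
        assert relation in ("seq", "layer")
        self.edges.append((src, dst, relation))

    def to_xml(self):
        root = ET.Element("work")
        for nid, ref in self.nodes.items():
            ET.SubElement(root, "sound", id=nid, audio=ref)
        for src, dst, rel in self.edges:
            ET.SubElement(root, "edge", src=src, dst=dst, type=rel)
        return ET.tostring(root, encoding="unicode")

# Usage: a drum loop layered with a bass line, followed by a chord.
g = MusicGraph()
g.add_sound("drums", "drums.wav")
g.add_sound("bass", "bass.wav")
g.add_sound("chord", "chord.wav")
g.connect("drums", "bass", "layer")
g.connect("drums", "chord", "seq")
print(g.to_xml())
```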